-
-
-
-
-
-
- Network Working Group H. Alvestrand
- Request for Comments: 1766 UNINETT
- Category: Standards Track March 1995
-
-
- Tags for the Identification of Languages
-
- Status of this Memo
-
- This document specifies an Internet standards track protocol for the
- Internet community, and requests discussion and suggestions for
- improvements. Please refer to the current edition of the "Internet
- Official Protocol Standards" (STD 1) for the standardization state
- and status of this protocol. Distribution of this memo is unlimited.
-
- Abstract
-
- This document describes a language tag for use in cases where it is
- desired to indicate the language used in an information object.
-
- It also defines a Content-language: header, for use in the case where
- one desires to indicate the language of something that has RFC-822-
- like headers, like MIME body parts or Web documents, and a new
- parameter to the Multipart/Alternative type, to aid in the usage of
- the Content-Language: header.
-
- 1. Introduction
-
- There are a number of languages spoken by human beings in this world.
-
- A great number of these people would prefer to have information
- presented in a language that they understand.
-
- In some contexts, it is possible to have information in more than one
- language, or it might be possible to provide tools for assisting in
- the understanding of a language (like dictionaries).
-
- A prerequisite for any such function is a means of labelling the
- information content with an identifier for the language in which it
- is written.
-
- In the tradition of solving only problems that we think we
- understand, this document specifies an identifier mechanism, and one
- possible use for it.
-
-
-
-
-
-
-
- Alvestrand [Page 1]
-
- RFC 1766 Language Tag March 1995
-
-
- 2. The Language tag
-
- The language tag is composed of 1 or more parts: A primary language
- tag and a (possibly empty) series of subtags.
-
- The syntax of this tag in RFC-822 EBNF is:
-
- Language-Tag = Primary-tag *( "-" Subtag )
- Primary-tag = 1*8ALPHA
- Subtag = 1*8ALPHA
-
- Whitespace is not allowed within the tag.
-
- All tags are to be treated as case insensitive; there exist
- conventions for capitalization of some of them, but these should not
- be taken to carry meaning.
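-
- The grammar above can be checked mechanically. The following is a
- minimal sketch in Python, not part of this specification; the name
- parse_language_tag is hypothetical. It accepts a tag only if every
- part is 1 to 8 ASCII letters, and lower-cases the result because tags
- are case insensitive.
-
```python
import re

# Language-Tag = Primary-tag *( "-" Subtag ); every part is 1*8ALPHA.
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def parse_language_tag(text):
    """Return the parts of the tag in lower case, or None if the syntax is invalid."""
    if _LANGUAGE_TAG.fullmatch(text) is None:
        return None  # whitespace, empty parts, digits, or parts over 8 letters
    return text.lower().split("-")
```
-
- For example, parse_language_tag("en-US") yields the two parts "en"
- and "us", while "en US" and "en-" are rejected.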
-
- The namespace of language tags is administered by the IANA according
- to the rules in section 5 of this document.
-
- The following registrations are predefined:
-
- In the primary language tag:
-
- - All 2-letter tags are interpreted according to ISO standard
- 639, "Code for the representation of names of languages" [ISO
- 639].
-
- - The value "i" is reserved for IANA-defined registrations
-
- - The value "x" is reserved for private use. Subtags of "x"
- will not be registered by the IANA.
-
- - Other values cannot be assigned except by updating this
- standard.
-
- The reason for reserving all other tags is to be open towards new
- revisions of ISO 639; the use of "i" and "x" is the minimum we can do
- here to be able to extend the mechanism to meet our requirements.
-
- In the first subtag:
-
- - All 2-letter codes are interpreted as ISO 3166 alpha-2
- country codes denoting the area in which the language is
- used.
-
- - Codes of 3 to 8 letters may be registered with the IANA by
- anyone who feels a need for it, according to the rules in
- section 5 of this document.
-
- The information in the subtag may for instance be:
-
- - Country identification, such as en-US (this usage is
- described in ISO 639)
-
- - Dialect or variant information, such as no-nynorsk or
- en-cockney
-
- - Languages not listed in ISO 639 that are not variants of
- any listed language, which can be registered with the i-
- prefix, such as i-cherokee
-
- - Script variations, such as az-arabic and az-cyrillic
-
- In the second and subsequent subtag, any value can be registered.
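-
- The predefined registrations can be summarised as a small
- classification routine. This is only a sketch: the function name
- describe_tag and the wording of the labels are assumptions made for
- illustration, and the input is assumed to be syntactically valid
- already.
-
```python
def describe_tag(tag):
    """Label the primary tag and the first subtag of a syntactically valid tag."""
    parts = tag.lower().split("-")
    primary = parts[0]
    if primary == "i":
        labels = ["IANA-defined registration"]
    elif primary == "x":
        labels = ["private use"]
    elif len(primary) == 2:
        labels = ["ISO 639 language code"]
    else:
        labels = ["not assignable without updating the standard"]

    if len(parts) > 1:
        first = parts[1]
        if primary == "x":
            # Subtags of "x" are never registered.
            labels.append("private subtag, not registered by the IANA")
        elif len(first) == 2:
            labels.append("ISO 3166 alpha-2 country code")
        elif len(first) >= 3:
            labels.append("value registered with the IANA")
        else:
            labels.append("single-letter subtag, not covered by the rules")
    return labels
```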
-
- NOTE: The ISO 639/ISO 3166 convention is that language names are
- written in lower case, while country codes are written in upper case.
- This convention is recommended, but not enforced; the tags are case
- insensitive.
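-
- As an illustration of the recommended capitalization, the sketch
- below (the name conventional_case is hypothetical) rewrites a tag
- with the language part in lower case and a 2-letter first subtag in
- upper case. The result is only cosmetic; it identifies the same
- language as the input.
-
```python
def conventional_case(tag):
    """Return the tag in the ISO 639 / ISO 3166 capitalization convention."""
    parts = tag.lower().split("-")
    # A 2-letter first subtag is a country code and is written in upper case.
    if len(parts) > 1 and len(parts[1]) == 2:
        parts[1] = parts[1].upper()
    return "-".join(parts)
```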
-
- NOTE: ISO 639 defines a registration authority for additions to and
- changes in the list of languages in ISO 639. This authority is:
-
- International Information Centre for Terminology (Infoterm)
- P.O. Box 130
- A-1021 Wien
- Austria
- Phone: +43 1 26 75 35 Ext. 312
- Fax: +43 1 216 32 72
-
- The following codes were added in 1989 (none later): ug
- (Uigur), iu (Inuktitut, also called Eskimo), za (Zhuang), he (Hebrew,
- replacing iw), yi (Yiddish, replacing ji), and id (Indonesian,
- replacing in).
-
- NOTE: The registration agency for ISO 3166 (country codes) is:
-
- ISO 3166 Maintenance Agency Secretariat
- c/o DIN Deutsches Institut fuer Normung
- Burggrafenstrasse 6
- Postfach 1107
- D-10787 Berlin
- Germany
- Phone: +49 30 26 01 320
- Fax: +49 30 26 01 231
-
- The country codes AA, QM-QZ, XA-XZ and ZZ are reserved by ISO 3166 as
- user-assigned codes.
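-
- A test for the user-assigned ranges might look as follows. This is a
- sketch; the function name is_user_assigned_country is hypothetical,
- and the ranges are copied from the sentence above.
-
```python
def is_user_assigned_country(code):
    """True if the code is one of AA, QM-QZ, XA-XZ or ZZ (case insensitive)."""
    code = code.upper()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        return False
    if code in ("AA", "ZZ"):
        return True
    if code[0] == "Q":
        return "M" <= code[1] <= "Z"
    return code[0] == "X"  # XA-XZ covers every code starting with X
```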
-
- 2.1. Meaning of the language tag
-
- The language tag always defines a language as spoken (or written) by
- human beings for communication of information to other human beings.
- Computer languages are explicitly excluded.
-
- There is no guaranteed relationship between languages whose tags
- start out with the same series of subtags; especially, they are NOT
- guaranteed to be mutually comprehensible, although this will
- sometimes be the case.
-
- Applications should always treat language tags as a single token; the
- division into main tag and subtags is an administrative mechanism,
- not a navigation aid.
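-
- In code, treating the tag as a single token means comparing whole
- tags, ignoring case, without falling back to a shared prefix. A
- minimal sketch, with hypothetical names tags_match and pick_exact:
-
```python
def tags_match(a, b):
    """Compare two language tags as whole tokens, ignoring case."""
    return a.lower() == b.lower()


def pick_exact(wanted, available):
    """Return the first available tag equal to `wanted`, or None.

    No prefix fallback is attempted: "no" does not match "no-nynorsk".
    """
    for tag in available:
        if tags_match(wanted, tag):
            return tag
    return None
```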
-
- The relationship between the tag and the information it relates to is
- defined by the standard describing the context in which it appears.
- So, this section can only give possible examples of its usage.
-
- - For a single information object, it should be taken as the
- set of languages that is required for a complete
- comprehension of the complete object. Example: Simple text.
-
- - For an aggregation of information objects, it should be taken
- as the set of languages used inside components of that
- aggregation. Examples: Document stores and libraries.
-
- - For information objects whose purpose in life is providing
- alternatives, it should be regarded as a hint that the
- material inside is provided in several languages, and that
- one has to inspect each of the alternatives in order to find
- its language or languages. In this case, multiple languages
- need not mean that one needs to be multilingual to get
- complete understanding of the document. Example: MIME
- multipart/alternative.
-
- - It would be possible to define (for instance) an SGML DTD
- that defines a <LANG xx> tag for indicating that following or
- contained text is written in this language, such that one
- could write "<LANG FR>C'est la vie</LANG>"; the Norwegian-
- speaking user could then access a French-Norwegian dictionary
- to find out what the quote meant.
-
- 3. The Content-language header
-
- The Content-Language header is intended for use in the case where one
- desires to indicate the language(s) of something that has RFC-822-like
- headers, like MIME body parts or Web documents.
-
- The RFC-822 EBNF of the Content-Language header is:
-
- Language-Header = "Content-Language" ":" 1#Language-Tag
-
- Note that the Language-Header is allowed to list several languages in
- a comma-separated list.
-
- Whitespace is allowed, which means also that one can place
- parenthesized comments anywhere in the language sequence.
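-
- A reader of this header therefore has to discard comments and
- surrounding whitespace before splitting on commas. The following
- Python sketch shows one way to do that; the function name
- parse_content_language is hypothetical, and the tags are returned as
- written, without checking them against the tag syntax.
-
```python
def parse_content_language(value):
    """Split a Content-Language value into tags, dropping RFC 822 comments."""
    depth = 0  # current nesting level of parenthesized comments
    kept = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\" and depth > 0:
            next(chars, None)  # skip a quoted character inside a comment
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    items = (item.strip() for item in "".join(kept).split(","))
    return [item for item in items if item]  # empty list elements are ignored
```
-
- With this sketch, the value "en, fr (This is a dictionary)" yields
- the two tags "en" and "fr".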
-
- 3.1. Examples of Content-language values
-
- NOTE: NONE of the subtags shown in this document have actually been
- assigned; they are used for illustration purposes only.
-
- Norwegian official document, with parallel text in both official
- versions of Norwegian. (Both versions are readable by all
- Norwegians).
-
- Content-Type: multipart/alternative;
- differences=content-language
- Content-Language: no-nynorsk, no-bokmaal
-
- Voice recording from the London docks
-
- Content-type: audio/basic
- Content-Language: en-cockney
-
- Document in Sami, which does not have an ISO 639 code, and is spoken
- in several countries, but with about half the speakers in Norway,
- with six different, mutually incomprehensible dialects:
-
- Content-type: text/plain; charset=iso-8859-10
- Content-Language: i-sami-no (North Sami)
-
- An English-French dictionary
-
- Content-type: application/dictionary
- Content-Language: en, fr (This is a dictionary)
-
- An official EC document (in a few of its official languages)
-
- Content-type: multipart/alternative
- Content-Language: en, fr, de, da, el, it
-
- An excerpt from Star Trek
-
- Content-type: video/mpeg
- Content-Language: x-klingon
-
- 4. Use of Content-Language with Multipart/Alternative
-
- When using the Multipart/Alternative body part of MIME, it is
- possible to have the body parts giving the same information content
- in different languages. In this case, one should put a Content-
- Language header on each of the body parts, and a summary Content-
- Language header onto the Multipart/Alternative itself.
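-
- As a sketch of this arrangement, the Python standard library "email"
- package can build such a message. The function name
- build_alternatives and the choice of UTF-8 for the text parts are
- assumptions made for this illustration only.
-
```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_alternatives(texts):
    """Build a multipart/alternative from (language_tag, text) pairs."""
    container = MIMEMultipart("alternative")
    for tag, text in texts:
        part = MIMEText(text, "plain", "utf-8")  # charset is an assumption
        part["Content-Language"] = tag  # one header on each body part
        container.attach(part)
    # Summary header on the Multipart/Alternative itself.
    container["Content-Language"] = ", ".join(tag for tag, _ in texts)
    return container
```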
-
- 4.1. The differences parameter to multipart/alternative
-
- As defined in RFC 1521, Multipart/Alternative only has one parameter:
- boundary.
-
- The common usage of Multipart/Alternative is to have more than one
- format of the same message (for example, PostScript and ASCII).
-
- The use of language tags to differentiate between different
- alternatives will certainly not lead all MIME UAs to present the most
- sensible body part as default.
-
- Therefore, a new parameter is defined, to allow the configuration of
- MIME readers to handle language differences in a sensible manner.
-
- Name: Differences
- Value: One or more of
- Content-Type
- Content-Language
-
- Further values can be registered with IANA; it must be the name of a
- header for which a definition exists in a published RFC. If not
- present, Differences=Content-Type is assumed.
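-
- Reading the parameter, including its default, can be sketched with
- the Python "email" package as shown below. The function name
- differing_headers is hypothetical. This memo does not say how several
- values are written in one parameter; the sketch assumes a
- comma-separated list, which is an assumption and not part of the
- definition above.
-
```python
from email.message import Message


def differing_headers(message):
    """Return the lower-cased header names listed in the differences parameter."""
    # Differences=Content-Type is assumed when the parameter is absent.
    value = message.get_param("differences", failobj="Content-Type")
    # Assumption: several values, if present, are separated by commas.
    return [name.strip().lower() for name in value.split(",") if name.strip()]
```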
-
- The intent is that the MIME reader can look at these headers of the
- message component to make an intelligent choice of what to present to
- the user, based on knowledge about the user preferences and
- capabilities.
-
- (The intent of having registration with IANA of the fields used in
- this context is to maintain a list of usages that a mail UA may
- expect to see, not to reject usages.)
-
- (NOTE: The MIME specification [RFC 1521], section 7.2, states that
- headers not beginning with "Content-" are generally to be ignored in
- body parts. People defining a header for use with "differences="
- should take note of this.)
-
- The mechanism for deciding which body part to present is outside the
- scope of this document.
-
- MIME EXAMPLE:
-
- Content-Type: multipart/alternative; differences=Content-Language;
- boundary="limit"
- Content-Language: en, fr, de
-
- --limit
- Content-Language: fr
-
- Le renard brun et agile saute par dessus le chien paresseux
- --limit
- Content-Language: de
- Content-Type: text/plain; charset=iso-8859-1
- Content-Transfer-Encoding: quoted-printable
-
- Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund
- --limit
- Content-Language: en
-
- The quick brown fox jumps over the lazy dog
- --limit--
-
- When composing a message, the choice of sequence may be somewhat
- arbitrary. However, non-MIME mail readers will show the first body
- part first, meaning that this should most likely be the language
- understood by most of the recipients.
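-
- How a reader picks a body part is outside the scope of this
- document, but one possible policy can be sketched with the Python
- "email" package. The names choose_part and text_of are hypothetical;
- the sketch assumes each body part carries a single language tag and
- falls back to the first part when nothing matches.
-
```python
from email import message_from_string

RAW = """\
Content-Type: multipart/alternative; differences=Content-Language;
 boundary="limit"
Content-Language: en, fr, de

--limit
Content-Language: fr

Le renard brun et agile saute par dessus le chien paresseux
--limit
Content-Language: de
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Der schnelle braune Fuchs h=FCpft =FCber den faulen Hund
--limit
Content-Language: en

The quick brown fox jumps over the lazy dog
--limit--
"""


def choose_part(message, preferred):
    """Return the body part for the first preferred tag, else the first part."""
    parts = message.get_payload()
    for wanted in preferred:
        for part in parts:
            tag = (part["Content-Language"] or "").strip()
            if tag.lower() == wanted.lower():  # whole-tag, case-insensitive match
                return part
    return parts[0]


def text_of(part):
    """Decode the transfer encoding and charset of a text body part."""
    charset = part.get_content_charset() or "us-ascii"
    return part.get_payload(decode=True).decode(charset).strip()
```
-
- Calling choose_part(message_from_string(RAW), ["de", "en"]) selects
- the German part, and text_of() then decodes its quoted-printable
- body.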
-
- 5. IANA registration procedure for language tags
-
- Any language tag must start with an existing tag, and extend it.
-
- This registration form should be used by anyone who wants to use a
- language tag not defined by ISO or IANA.
-
- ----------------------------------------------------------------------
- LANGUAGE TAG REGISTRATION FORM
-
- Name of requester :
- E-mail address of requester:
- Tag to be registered :
-
- English name of language :
-
- Native name of language (transcribed into ASCII):
-
- Reference to published description of the language (book or article):
- ----------------------------------------------------------------------
-
- The language form must be sent to <ietf-types@uninett.no> for a
- 2-week review period before submitting it to IANA. (This is an open
- list. Requests to be added should be sent to
- <ietf-types-request@uninett.no>.)
-
- When the two week period has passed, the language tag reviewer, who
- is appointed by the IETF Applications Area Director, either forwards
- the request to IANA@ISI.EDU, or rejects it because of significant
- objections raised on the list.
-
- Decisions made by the reviewer may be appealed to the IESG.
-
- All registered forms are available online in the directory
- ftp://ftp.isi.edu/in-notes/iana/assignments/languages/
-
- 6. Security Considerations
-
- Security issues are not discussed in this memo.
-
- 7. Character set considerations
-
- Codes may always be expressed using the US-ASCII character repertoire
- (a-z), which is present in most character sets.
-
- The issue of deciding upon the rendering of a character set based on
- the language tag is not addressed in this memo; however, it is
- thought impossible to make such a decision correctly for all cases
- unless means of switching language in the middle of a text are
- defined (for example, a rendering engine that decides font based on
- Japanese or Chinese language will fail to work when a mixed
- Japanese-Chinese text is encountered).
-
- 8. Acknowledgements
-
- This document has benefited from innumerable rounds of review and
- comments in various fora of the IETF and the Internet working groups.
- As such, any list of contributors is bound to be incomplete; please
- regard the following as only a selection from the group of people who
- have contributed to make this document what it is today.
-
- In alphabetical order:
-
- Tim Berners-Lee, Nathaniel Borenstein, Jim Conklin, Dave Crocker,
- Ned Freed, Tim Goodwin, Olle Jarnefors, John Klensin, Keith Moore,
- Masataka Ohta, Keld Jorn Simonsen, Rhys Weatherley, and many, many
- others.
-
- 9. Author's Address
-
- Harald Tveit Alvestrand
- UNINETT
- Pb. 6883 Elgeseter
- N-7002 TRONDHEIM
- NORWAY
-
- EMail: Harald.T.Alvestrand@uninett.no
- Phone: +47 73 59 70 94
-
- 10. References
-
- [ISO 639]
- ISO 639:1988 (E/F) - Code for the representation of names of
- languages - The International Organization for
- Standardization, 1st edition, 1988 17 pages Prepared by
- ISO/TC 37 - Terminology (principles and coordination).
-
- [ISO 3166]
- ISO 3166:1988 (E/F) - Codes for the representation of names
- of countries - The International Organization for
- Standardization, 3rd edition, 1988-08-15.
-
- [RFC 1521]
- Borenstein, N., and N. Freed, "MIME Part One: Mechanisms for
- Specifying and Describing the Format of Internet Message
- Bodies", RFC 1521, Bellcore, Innosoft, September 1993.
-
- [RFC 1327]
- Kille, S., "Mapping between X.400(1988) / ISO 10021 and RFC
- 822", RFC 1327, University College London, May 1992.